Catch-trial performance

## [1] "Excluded 31 participants based on catch-trial performance."

Exclusion of random guesses

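We further exclude participants who appear to give ratings that do not depend on the scene they are shown. To quantify this, we compute, for each participant, the mean rating for each utterance across all of their trials, and then compute the correlation between the participant's actual ratings and these per-utterance means. A participant who attends to the scene should rate the same utterance differently in different scenes, so this correlation should not be very high. A high correlation means that the ratings are largely predictable from the utterance alone, which indicates that the participant responded without regard to the scene, i.e., guessed. We therefore also exclude the data from participants for whom this correlation is greater than 0.75.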

## [1] "Excluded 1 participants based on random responses."
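The exclusion rule can be sketched in a few lines of Python (the analysis itself appears to have been run in R). This is a minimal sketch, assuming each participant's data is a list of (utterance, rating) pairs pooled over all trials; the function names and data layout are hypothetical, and only the 0.75 threshold comes from the text.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def scene_independence(trials):
    """Correlation between one participant's ratings and their per-utterance means.

    trials: list of (utterance, rating) pairs from all of that participant's trials
    (hypothetical layout).
    """
    by_utterance = defaultdict(list)
    for utterance, rating in trials:
        by_utterance[utterance].append(rating)
    means = {u: sum(r) / len(r) for u, r in by_utterance.items()}
    ratings = [rating for _, rating in trials]
    predicted = [means[u] for u, _ in trials]  # each rating's utterance mean
    return pearson(ratings, predicted)


def flag_scene_independent(data, threshold=0.75):
    """Participant ids whose correlation exceeds the threshold.

    data: dict mapping participant id -> list of (utterance, rating) pairs.
    A NaN correlation (e.g. identical means for every utterance) is never
    flagged, because NaN > threshold is False.
    """
    return {pid for pid, trials in data.items()
            if scene_independence(trials) > threshold}
```

A participant who gives each utterance the same rating in every scene reproduces their own per-utterance means exactly and gets a correlation of 1, whereas ratings that vary with the scene pull the correlation down.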

Aggregated results

Individual responses

AUC computation

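We compute the AUC directly with the AUC function, using its spline method. The two vectors below list the per-participant AUC differences (auc_diff): the first vector holds the 72 participants in the cautious condition (mean 18.04), the second the 72 participants in the confident condition (mean 5.25).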

##  [1] -15.673873  -7.634082  27.689755  39.061802  28.342940  24.839464
##  [7]  -3.903503  48.438740  34.221671  35.007151  42.933573  12.497181
## [13]  14.625947  15.838332  25.898586  -4.608395   7.694485  24.713868
## [19]  13.918289  18.356880   7.750705  14.625947  50.563930   1.911270
## [25]   4.667133  22.560162  12.927865  33.262202  34.221671  51.030906
## [31]  27.689755 -31.089861  67.664372  18.027560  65.640823 -38.306152
## [37]  34.221671  -2.754073  -5.656658 -12.434285  48.337958  12.300371
## [43]   3.259558  30.793440  -1.506618  19.851896  86.062019  36.106809
## [49]   1.796936  -2.209590  -0.298197  13.467697  17.269121  57.368724
## [55]  61.505673  32.070449  17.833441  -9.585335  -9.244998  30.517922
## [61]  24.162369   6.955771 -20.019694  34.221671  38.453389  29.339265
## [67] -34.846237  12.615582  26.187437   1.748617  42.418521 -46.613823

##  [1]  -0.04258585  15.85585453  34.22167115 -60.66013011  34.22167115
##  [6]   6.83561287 -24.31732431   2.75461904  43.39936050 -42.74204299
## [11] -41.09733902  15.92309376   4.04704308  16.42078182  21.28083703
## [16]   5.68868015  39.97115108  -2.35709708   6.65250894  -7.87378669
## [21]  27.68975489   6.53177206  15.82300178   0.93133264  12.63474667
## [26]  13.62088092   1.65556374   0.91110730 -34.22167115   8.97860471
## [31]  18.82493447 -24.57437786  -0.32704894  11.34579805  57.78603952
## [36] -24.61519416  16.05335923 -27.86721096  -3.26253625 -27.47910903
## [41]   1.00971494  25.81779479  -5.20972611   5.42697871 -12.67403905
## [46]  16.76981730 -54.34819214  31.66801769  -2.40390717 -13.39404036
## [51]  -4.12129641  -4.01584329   0.59617160  57.78603952  62.21260505
## [56]  38.64821711 -13.71704203 -68.24809490  -8.28171618  37.18261243
## [61]  27.35622313  -1.77413452 -22.23299260  34.22167115  13.85089559
## [66]  34.22167115   7.23460603   6.08361631  22.31985215  41.46056847
## [71]  45.00124944 -38.72330819
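A spline-based AUC interpolates the rating curve with a cubic spline and integrates the spline instead of straight line segments. The Python sketch below assumes a natural cubic spline (second derivative zero at both ends) and integrates each cubic piece exactly; the spline variant and numerical integration used by the AUC function in the original analysis may differ, so values need not match to the last digit. The x grid and which two curves are subtracted to form auc_diff are not stated in this section, so a call such as spline_auc(x, curve_a) - spline_auc(x, curve_b) is purely hypothetical.

```python
def natural_spline_second_derivs(x, y):
    """Second derivatives M[i] of the natural cubic spline through (x[i], y[i]).

    x must be strictly increasing. Natural boundary conditions fix
    M[0] = M[n] = 0; the interior values solve a tridiagonal system
    (Thomas algorithm).
    """
    n = len(x) - 1                      # number of intervals
    h = [x[i + 1] - x[i] for i in range(n)]
    M = [0.0] * (n + 1)
    k = n - 1                           # number of interior unknowns
    if k < 1:
        return M                        # two points: the spline is a straight line
    sub = [h[i] for i in range(k)]                       # coefficient of M[i]
    diag = [2.0 * (h[i] + h[i + 1]) for i in range(k)]  # coefficient of M[i+1]
    sup = [h[i + 1] for i in range(k)]                   # coefficient of M[i+2]
    rhs = [6.0 * ((y[i + 2] - y[i + 1]) / h[i + 1] - (y[i + 1] - y[i]) / h[i])
           for i in range(k)]
    # Forward elimination
    for i in range(1, k):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # Back substitution
    sol = [0.0] * k
    sol[-1] = rhs[-1] / diag[-1]
    for i in range(k - 2, -1, -1):
        sol[i] = (rhs[i] - sup[i] * sol[i + 1]) / diag[i]
    M[1:n] = sol
    return M


def spline_auc(x, y):
    """Area under the natural cubic spline through the points, from x[0] to x[-1].

    Each cubic piece integrates exactly to
    h * (y_i + y_{i+1}) / 2 - h**3 * (M_i + M_{i+1}) / 24.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    M = natural_spline_second_derivs(x, y)
    total = 0.0
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        total += h * (y[i] + y[i + 1]) / 2 - h ** 3 * (M[i] + M[i + 1]) / 24
    return total
```

For linear data the spline is the line itself and the result equals the trapezoid rule; for curved data the two differ (for the points (0, 0), (1, 1), (2, 0) the spline area is 1.25 while the trapezoid area is 1.0).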

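Two-sample t-test and effect size (Cohen's d) comparing the AUC differences between the cautious and confident conditions, followed by a regression model with control variables and an F-test comparing it against a model with condition only: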

## 
##  Two Sample t-test
## 
## data:  aucs.cautious$auc_diff and aucs.confident$auc_diff
## t = 2.9276, df = 142, p-value = 0.00398
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.153181 21.423418
## sample estimates:
## mean of x mean of y 
##  18.04311   5.25481
## 
## Cohen's d
## 
## d estimate: 0.487931 (small)
## 95 percent confidence interval:
##    lower    upper 
## 0.153596 0.822266
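The output reports a "Two Sample t-test" rather than a Welch test, i.e. equal variances are assumed and the standard deviation is pooled across groups. A minimal Python sketch of the t statistic and Cohen's d under that assumption follows; the function name is hypothetical, and p-values and confidence intervals, which need the t distribution, are left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_and_cohens_d(x, y):
    """Student's two-sample t statistic (equal variances) and Cohen's d.

    Both divide the mean difference by the pooled standard deviation;
    t additionally scales by sqrt(1/n1 + 1/n2).
    """
    n1, n2 = len(x), len(y)
    diff = mean(x) - mean(y)
    # Pooled variance from the two sample variances (n - 1 denominators)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    sp = sqrt(sp2)
    t = diff / (sp * sqrt(1 / n1 + 1 / n2))
    d = diff / sp
    return t, n1 + n2 - 2, d
```

Because both statistics use the same pooled SD, d = t * sqrt(1/n1 + 1/n2). With 72 participants per condition this factor is exactly 1/6, and 2.9276 / 6 is approximately 0.4879, consistent with the reported d of 0.487931.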
## 
## Call:
## lm(formula = auc_diff ~ cond + test_order + first_speaker_type + 
##     confident_speaker, data = rbind(aucs.cautious, aucs.confident))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -75.676 -14.133   0.127  16.111  72.716 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                        15.558      4.876   3.191  0.00175 **
## condconfident (probably-biased)   -12.788      4.384  -2.917  0.00412 **
## test_orderreverse                  -2.212      4.390  -0.504  0.61504   
## first_speaker_typeconfidentfirst    5.686      4.393   1.294  0.19771   
## confident_speakerconfidentm         1.185      4.393   0.270  0.78771   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.3 on 139 degrees of freedom
## Multiple R-squared:  0.07014,    Adjusted R-squared:  0.04338 
## F-statistic: 2.621 on 4 and 139 DF,  p-value: 0.0375
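The coefficient names in the regression output (e.g. condconfident (probably-biased)) reflect treatment coding: each categorical predictor gets one 0/1 column per non-reference level, and the coefficient is the shift relative to the reference level. The pure-Python sketch below shows that coding plus ordinary least squares via the normal equations; it is an illustration under these assumptions, not a reimplementation of lm(), which solves the least-squares problem with a QR decomposition. All names are hypothetical.

```python
def design_matrix(rows, factors):
    """Treatment (dummy) coding of categorical predictors.

    rows: list of dicts mapping variable name -> level.
    factors: dict mapping factor name -> its reference level.
    Returns (X, names): an intercept column plus one 0/1 column per
    non-reference level (levels sorted), named factor + level.
    """
    names = ["(Intercept)"]
    levels = {}
    for f, ref in factors.items():
        levels[f] = sorted({r[f] for r in rows} - {ref})
        names += [f + lv for lv in levels[f]]
    X = [[1.0] + [1.0 if r[f] == lv else 0.0 for f in factors for lv in levels[f]]
         for r in rows]
    return X, names


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Solved by Gaussian elimination with partial pivoting, which is fine
    for a small, well-conditioned sketch.
    """
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for i in range(col + 1, p):
            w = A[i][col] / A[col][col]
            for j in range(col, p):
                A[i][j] -= w * A[col][j]
            b[i] -= w * b[col]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta
```

With condition as the only predictor, its coefficient is exactly the difference between the two group means. In the output above, the condition coefficient (-12.788) also equals the difference of the t-test means (5.25481 - 18.04311, approximately -12.788), which is what one would expect if the control variables are balanced across the two conditions.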
## Analysis of Variance Table
## 
## Model 1: auc_diff ~ cond
## Model 2: auc_diff ~ cond + test_order + first_speaker_type + confident_speaker
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1    142 97543                           
## 2    139 96176  3    1367.1 0.6586 0.5789
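The analysis-of-variance table compares the condition-only model with the model that adds the three control variables. The F statistic is the extra sum of squares explained per extra parameter, divided by the residual mean square of the larger model. A short Python check using the rounded values printed above (the function name is hypothetical; the p-value would additionally need the F distribution):

```python
def nested_f_test(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for a reduced linear model nested in a full model.

    F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full)
    """
    extra_df = df_reduced - df_full
    extra_ss = rss_reduced - rss_full
    f = (extra_ss / extra_df) / (rss_full / df_full)
    return extra_ss, extra_df, f


# Residual sums of squares and residual df as printed (rounded) in the table
ss, df, f = nested_f_test(97543, 142, 96176, 139)
print(ss, df, round(f, 4))  # 1367 3 0.6586
```

The extra sum of squares comes out as 1367 rather than 1367.1 only because the printed RSS values are rounded; the F statistic matches the table.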